Attention Is All You Need
Extends the attention mechanism in a novel way and proposes the Transformer model architecture
Self-attention
Translation is one kind of sequence transduction task
Abstract
We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.
A new, simple network architecture, the Transformer, is proposed.
It is based solely on the attention mechanism, dispensing with recurrent and convolutional networks entirely.
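As a rough illustration of the scaled dot-product attention the abstract alludes to, here is a minimal NumPy sketch of single-head self-attention. The function name self_attention, the projection matrices w_q / w_k / w_v, and the toy shapes are illustrative assumptions, not taken from the paper's reference implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model); w_q, w_k, w_v: (d_model, d_k) projection matrices."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v       # project input into queries, keys, values
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)           # scaled dot-product similarity between positions
    weights = softmax(scores, axis=-1)        # each row sums to 1 over the input positions
    return weights @ v                        # output = attention-weighted sum of values

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 5, 16, 8
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)  # (5, 8)
```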
On machine translation tasks, the Transformer outperformed the previous best models in quality (BLEU) while training in significantly less time.
7. Conclusion
In this work, we presented the Transformer, the first sequence transduction model based entirely on attention, replacing the recurrent layers most commonly used in encoder-decoder architectures with multi-headed self-attention.
For translation tasks, the Transformer can be trained significantly faster than architectures based on recurrent or convolutional layers.
For translation tasks, the Transformer can be trained significantly faster than recurrent or convolutional architectures, since attention lets all positions be processed in parallel instead of sequentially.
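As a quick usage sketch of the multi-headed self-attention the conclusion refers to, PyTorch's built-in nn.MultiheadAttention can stand in for the paper's module; the embed_dim = 512 and num_heads = 8 below match the base model's d_model and h, but this is only an assumed stand-in, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Multi-head self-attention: queries, keys, and values all come from the same sequence.
mha = nn.MultiheadAttention(embed_dim=512, num_heads=8, batch_first=True)
x = torch.randn(2, 10, 512)   # (batch, seq_len, d_model)
out, attn = mha(x, x, x)      # passing x three times makes this self-attention
print(out.shape)              # torch.Size([2, 10, 512])
print(attn.shape)             # torch.Size([2, 10, 10]), weights averaged over the 8 heads
```

Splitting d_model across several heads lets the model attend to information from different representation subspaces at different positions, which a single attention head averages away.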
TODO Attention Visualization (Appendix) Figure 3